dnadna.utils.config
Config file, serialization, and schema handling.
Functions

- load_dict: Loads a nested JSON-like data structure from a given filename.
- load_dict_from_json: Load a JSON file as a dict.
- save_dict: Serializes a nested JSON-like data structure to a given filename.
- save_dict_annotated: Serializes a (possibly nested) dict to YAML, annotated with comments from the schema.
- save_dict_to_json: Save a dictionary into a JSON file.

Classes

- Config: Represents the configuration for one of DNADNA's components, such as simulation and training configuration.
- ConfigMixIn: Mix-in for classes that accept a Config object to provide part of their attribute namespace.
- ConfigValidator: A custom validator wrapping jsonschema.Draft7Validator with DNADNA-specific extensions.
- DeepChainMap: Like collections.ChainMap, but applies chaining recursively to nested dictionaries.

Exceptions

- ConfigError
- class dnadna.utils.config.Config(*args, overrides=[], validate=False, schema=None, filename=None, resolve_inherits=False, resolve_overrides=True)[source]
Bases: DeepChainMap
Represents the configuration for one of DNADNA’s components, such as simulation and training configuration.
This is a specialized subclass of DeepChainMap, with extra bells and whistles, in particular validating the configuration against a JSON Schema using a specialized schema validator with enhanced functionality over the default jsonschema.validate functionality. See ConfigValidator for examples of the extra functionality provided by the custom schema validator used by Config.

Another special feature of Config is the ability to link multiple config files together with a special "inherit" property: if the value of a keyword in the config is a dict containing the "inherit" key, the value of that dict is loaded directly from the file pointed to by "inherit". Any additional keys in the dict containing "inherit" override/extend the dict loaded from the inherit. See the Examples section below for explicit examples.

- Parameters:
*args – One or more dict or other mapping types from which to instantiate the Config. In normal usage only one dict should be passed. The support for multiple positional arguments is in order to support the underlying DeepChainMap functionality. The reason DeepChainMap is used is to support the "inherit" functionality. Each inherited Config is added to the tree of DeepChainMap.maps.

- Keyword Arguments:
overrides (list) – (optional) – Same as in DeepChainMap.

validate (bool or dict) – (optional) – Validate the given config. This checks two things: it validates that all inherits were successfully resolved, and if a schema was specified, it then also validates the config against that schema (default: False). If a non-empty dict is given instead, validation is enabled, and the dict is passed as keyword arguments to the ConfigValidator class to control its behavior. This is used primarily for implementation purposes.

schema (str or dict) – (optional) – The JSON Schema against which to validate the config. This can either be the name of one of the built-in schemas (see Config.schemas) or it can be a full JSON Schema object represented as a dict.

filename (str or pathlib.Path) – (optional) – If the Config was read from a file (e.g. as with Config.from_file) this argument can be used to store the name of the file the config was read from. This should normally not be used directly, as it is normally set when using Config.from_file.

resolve_inherits (bool or dict) – (optional) – If True, make sure all "inherit" keywords in the given config are resolved to their true values by loading the inherited config files. If resolve_inherits is a non-empty dict this has the same effect as True, except the dict is passed as keyword arguments to the load_dict call that is used to read inherited config files. This argument is primarily for internal use and testing, and should not be used directly without knowing what you're doing.
Examples
>>> from dnadna.utils.config import Config
Under the simplest usage, a Config is just a simple wrapper for a dict:

>>> config = Config({'a': 1, 'b': 'c'})
>>> config['a']
1
>>> config['b']
'c'
However, when a schema is provided and validate=True, the wrapped dict is validated against that schema. If validation succeeds, the Config object is quietly instantiated with success:

>>> schema = {'properties': {'a': {'type': 'integer'}}}
>>> config = Config({'a': 1, 'b': 'c'}, validate=True, schema=schema)
But when validation fails, instantiating the Config will fail with a ConfigError exception:

>>> config = Config({'a': 'b', 'c': 'd'}, validate=True, schema=schema)
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'a': 'b' is not of type 'integer'
Validation can also be delayed. If a schema was provided but validate=False, a later call to Config.validate will validate the instantiated Config against that schema:

>>> config = Config({'a': 'b', 'c': 'd'}, schema=schema)
>>> config['a']  # successfully created despite violating the schema
'b'
>>> config.validate()  # validates against the previously provided schema
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'a': 'b' is not of type 'integer'
It is also possible to validate against one of the many built-in schemas given by:

>>> sorted(Config.schemas)
['dataset', 'dataset_formats/dnadna', 'definitions', 'nets/...', ..., 'param-set', ..., 'training', 'training-run']
For example:

>>> from dnadna.examples.one_event import DEFAULT_ONE_EVENT_CONFIG
>>> config = Config(DEFAULT_ONE_EVENT_CONFIG.copy(), schema='simulation')
>>> config.validate() is None
True
You can also view the full values of these schemas, like:

>>> Config.schemas['simulation']
{'$schema': 'http://json-schema.org/draft-07/schema#', '$id': 'py-pkgdata:dnadna.schemas/simulation.yml', 'type': 'object', 'description': 'JSON Schema (YAML-formatted) for basic properties of a simulation...', ...}
Now we discuss inherits, which is a slightly complicated subject. The most basic usage is having a key in the config dictionary like "key": {"inherit": "/path/to/inherited/file"}. In this case the value associated with "key" is replaced with the contents of the inherited file:

>>> from dnadna.utils.config import save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> inherited = tmp / 'inherited.json'
>>> save_dict({'foo': 'bar', 'baz': 'qux'}, inherited)
>>> d = {'key': {'inherit': str(inherited)}}
In the original dict, the value of the 'key' key is just as we specified:

>>> d['key']
{'inherit': '...inherited.json'}
But when we instantiate Config from this, the value for 'key' will be transparently replaced with the contents of inherited.json:

>>> config = Config(d, resolve_inherits=True)
>>> config['key']
Config({'foo': 'bar', 'baz': 'qux'})
Inherits can also be nested: if file A inherits from file B, and file B also contains inherits, the inherits in file B are resolved first, and so on, as the sketch below demonstrates.
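Here is a minimal sketch of a nested inherit (hedged; the file names are illustrative, and it reuses the tmp directory and save_dict from above):

>>> inner = tmp / 'inner.json'
>>> outer = tmp / 'outer.json'
>>> save_dict({'deep': 'value'}, inner)
>>> save_dict({'nested': {'inherit': str(inner)}}, outer)
>>> config = Config({'key': {'inherit': str(outer)}}, resolve_inherits=True)
>>> config['key']['nested']['deep']
'value'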
If a dict contains the 'inherit' keyword as well as other keys, first the inherit is resolved, but then the other keys in the dict override the inherited dict. This is made possible by the use of DeepChainMap:

>>> d = {'key': {
...     'inherit': str(inherited),
...     'baz': 'quizling',
...     'fred': 'barney',
... }}
>>> config = Config(d, resolve_inherits=True)
>>> config['key']
Config({'baz': 'quizling', 'fred': 'barney', 'foo': 'bar'})
In the previous examples we used absolute filename paths with 'inherit', but it may also contain a relative path. If it contains a relative path there are two possibilities: if the parent config does not have a .filename, then relative paths are simply resolved relative to the current working directory. This is not terribly useful, because it might resolve to a different file depending on what directory you're currently working in. More useful is that when the parent file does have a .filename, relative paths are considered relative to the directory containing the parent file.

For example, let's put a parent and child file in the same directory:

>>> parent_filename = tmp / 'parent.json'
>>> child_filename = tmp / 'child.json'
>>> save_dict({'a': 1}, child_filename)
>>> save_dict({'foo': {'inherit': 'child.json'}, 'b': 2}, parent_filename)
As noted, both files are in the same directory:

>>> parent_filename.parent == child_filename.parent
True
So we could specify just {'inherit': 'child.json'}, meaning inherit from the file child.json in the same directory as me:

>>> parent = Config.from_file(parent_filename)
>>> parent
Config({'foo': {'a': 1}, 'b': 2})
>>> parent['foo']
Config({'a': 1})
This feature is particularly useful when there are multiple config files in a rigid directory structure, where one file is always going to be in the same position in the file hierarchy relative to the files it inherits from. So the relationship between the files is maintained even if the root of the directory structure is moved, e.g. between different machines.
- copy(folded=False)[source]
New Config or subclass with a new copy of maps[0] and refs to maps[1:].

If folded=True, however, it returns a copy with all maps folded in so that there is only one map in the resulting copy; that is, it is equivalent to Config(chain_map.dict()).

Also copies the filename.
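Examples

A minimal sketch of the folded behavior (hedged; it relies on the maps attribute inherited from DeepChainMap):

>>> c = Config({'a': 1}, {'b': 2})
>>> folded = c.copy(folded=True)
>>> len(folded.maps)
1
>>> folded.dict() == c.dict()
True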
- classmethod from_default(name, validate=True, schema=None, resolve_inherits=True, **kwargs)[source]
Load one of the default config files from dnadna.DEFAULTS_DIR.

The filename extension may be omitted, so that Config.from_default('simulation') is the same as Config.from_default('simulation.yml'); as such, that directory should not contain any conflicting filenames.

Remaining keyword arguments are the same as those to Config, with the exception that the schema argument may only be a string, since the use of lru_cache means all arguments must be hashable.

By default, the default config file is validated against the schema of the same name. For example, Config.from_default('dataset') validates against the 'dataset' schema if it exists.

Examples
>>> from dnadna.utils.config import Config
>>> Config.from_default('dataset', schema='dataset')
Config({'data_root': '.', 'dataset_name': 'generic', ...})
- classmethod from_file(filename, validate=True, schema=None, resolve_inherits=True, **kwargs)[source]
Read the Config from a supported JSON-like file, currently either a JSON or YAML file.

- Parameters:
filename (str or pathlib.Path) – The filename to read from; currently it should have either a .json, .yml, or .yaml extension in order to correctly determine the file format. Other formats implemented by additional subclasses of DictSerializer may be supported in the future.

- Keyword Arguments:
validate (bool or dict) – (optional) – Same as the validate option to the standard Config constructor (default: True).

schema (str or dict) – (optional) – Same as the schema option to the standard Config constructor.

resolve_inherits (bool or dict) – (optional) – Same as the resolve_inherits option to the standard Config constructor.

**kwargs – Additional keyword arguments are passed to the underlying load_dict call.
Examples
>>> from dnadna.utils.config import Config, save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> filename = tmp / 'config.json'
>>> save_dict({'a': 1}, filename)
>>> schema = {'properties': {'a': {'type': 'integer'}}}
>>> config = Config.from_file(filename, schema=schema)
>>> config['a']
1
>>> str(config.filename)
'...config.json'
>>> schema['properties']['a']['type'] = 'string'
>>> Config.from_file(filename, schema=schema)
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in ".../config.json" at 'a': 1 is not of type 'string'
- property schemas
dict mapping the names of built-in schemas to their values.

Built-in schemas are loaded from any .json, .yml, or .yaml files in the directories listed in SCHEMA_DIRS.

Schemas in sub-directories of paths in SCHEMA_DIRS have their subdirectory path prepended to the name with /.
- to_file(filename=None, **kwargs)[source]
Save the Config to the file given by filename.

If the Config was read from a file and has a non-empty .filename attribute, it will be written back to the same file by default.

Additional kwargs depend on the file format and are passed to the appropriate DictSerializer depending on the filename.

This is equivalent to calling save_dict with self.
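Examples

A minimal round-trip sketch (hedged; it reuses the pytest tmp_path fixture used elsewhere on this page):

>>> tmp = getfixture('tmp_path')  # pytest specific
>>> config = Config({'a': 1})
>>> config.to_file(tmp / 'out.json')
>>> Config.from_file(tmp / 'out.json')
Config({'a': 1})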
- unresolve_inherits(config_dir=None, only=None)[source]
A sort of inversion of Config.resolve_inherits.

This walks through all the chained mappings in this Config; any that have a non-empty .filename are removed from the chained mappings and replaced with an entry in an inherit property for the top-level mapping.

This returns a new Config with all the relevant replacements made.

Examples
>>> from dnadna.utils.config import save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> inherited = tmp / 'inherited.json'
>>> save_dict({'foo': 'bar', 'baz': 'qux'}, inherited)
>>> c = Config({'key': {'inherit': str(inherited)}},
...            resolve_inherits=True)
...
>>> c
Config({'key': {'foo': 'bar', 'baz': 'qux'}})
>>> c2 = c.unresolve_inherits()
>>> c2
Config({'key': {'inherit': '...inherited.json'}})
- validate(schema=None, **validator_kwargs)[source]
Ensure that the configuration is valid:

- All keys should be strings (for JSON-compatibility).
- If a JSON schema is given, validate the config against that schema. The schema may either be a full JSON Schema given as a dict, or a key into the Config.schemas registry.
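Examples

A hedged sketch of the string-key check (the exact error message shown here is illustrative, not taken from the implementation):

>>> config = Config({'a': 1, 2: 'b'})
>>> config.validate()
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: ...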
- exception dnadna.utils.config.ConfigError(config, msg, suffix='', path=())[source]
Bases: ValueError
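Since ConfigError subclasses ValueError, it can also be caught as a ValueError:

>>> from dnadna.utils.config import ConfigError
>>> issubclass(ConfigError, ValueError)
True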
- class dnadna.utils.config.ConfigMixIn(config={}, validate=True)[source]
Bases: object
Mix-in for classes that accept a Config object to provide part of their attribute namespace. Makes top-level keys in the Config object accessible as attributes on instances of the class.

Includes optional validation of the Config against a schema by setting the config_schema class attribute.

The config_schema attribute may be either the name of a built-in schema, or a JSON Schema object (see Config.validate).

If config_default is provided, it provides default values for the config which can be overridden.

Examples
>>> from dnadna.utils.config import Config, ConfigMixIn
>>> class MyClass(ConfigMixIn):
...     config_schema = {
...         'properties': {'a': {'type': 'integer'}}
...     }
...
...     def __init__(self, config, foo=1, validate=True):
...         super().__init__(config, validate=validate)
...         self.foo = foo
...
>>> config = Config({'a': 1, 'b': 'b'})
>>> inst = MyClass(config)
>>> inst.a
1
>>> inst.b
'b'
Assignment to attributes that are keys in the Config also updates the underlying Config. Such updates are not validated against the schema:

>>> inst.b = 'c'
>>> inst.b
'c'
>>> inst.config['b']
'c'
Validation is performed upon instantiation unless passed validate=False:

>>> config = Config({'a': 'a', 'b': 'b'})
>>> inst = MyClass(config)
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'a': 'a' is not of type 'integer'
Note, if validation is disabled, then there is no guarantee the object will work properly if the config is invalid.
- config_attr = 'config'
The name of the attribute in which instances of this class store their Config. Typically this is just the .config attribute.
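For example, a subclass could rename the attribute (a hedged sketch; the names here are illustrative):

>>> class MyTask(ConfigMixIn):
...     config_attr = 'params'
...
>>> inst = MyTask(Config({'a': 1}))
>>> inst.params['a']
1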
- config_schema = None
The schema against which this class should validate its Config by default.

May be either the name of one of the built-in schemas (see Config.schemas) or a full schema object.
- classmethod from_config_file(filename, validate=True, **kwargs)[source]
Instantiate from a config file.
This method must be overridden if the subclass takes additional __init__ arguments besides config and validate.
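Examples

A minimal sketch (hedged; it assumes the default ConfigMixIn constructor signature and reuses helpers shown elsewhere on this page):

>>> from dnadna.utils.config import ConfigMixIn, save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> filename = tmp / 'my_config.json'
>>> save_dict({'a': 1}, filename)
>>> class MyClass(ConfigMixIn):
...     config_schema = {'properties': {'a': {'type': 'integer'}}}
...
>>> inst = MyClass.from_config_file(filename)
>>> inst.a
1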
- validate_config(config)[source]
Validate the config file with which this class was initialized.
By default it validates the config file against the associated ConfigMixIn.config_schema schema, but this method may be overridden to add additional semantic validation to the config file that is not possible through the schema alone.
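For instance, a subclass might add a cross-field check that a schema alone cannot express (a hedged sketch; the field names are illustrative, and it uses the ConfigError signature documented above):

>>> from dnadna.utils.config import ConfigError, ConfigMixIn
>>> class MyRange(ConfigMixIn):
...     def validate_config(self, config):
...         super().validate_config(config)
...         if config.get('start', 0) > config.get('end', 0):
...             raise ConfigError(config, "'start' must not exceed 'end'")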
- class dnadna.utils.config.ConfigValidator(schema, *args, resolve_plugins=True, resolve_defaults=True, resolve_filenames=True, posixify_filenames=False, **kwargs)[source]
Bases: object
A custom validator wrapping the jsonschema.Draft7Validator class which supports special validation functionality for DNADNA Config objects:

- Recognizes Config objects as JSON 'object' instances.

- Adds new string formats:

  - filename: When a Config is loaded from a file, any values in the Config that are recognized by the specified JSON schema as representing a filename are automatically resolved to absolute paths relative to the config file's location. If the filename is already an absolute filename it is left alone. If the config does not have an associated filename, relative paths are treated as relative to the current working directory.

  - filename!: Same as filename without the !, but a schema validation error is raised if the resulting filename does not exist on the filesystem.

  - python-module: The name of a Python module/package that should be importable via the standard import system (e.g. import dnadna). If an ImportError is raised when trying to import this module, a schema validation error is raised.

- If the schema specifies defaults for any properties, those default values are filled into the Config if it is otherwise missing values for those properties.

- If the schema specifies an "errorMsg" property, custom error messages for validation errors can be provided and shown to users. See ConfigValidator.validate for examples.
Examples
>>> from dnadna.utils.config import ConfigValidator, Config
>>> schema = {
...     'type': 'object',
...     'properties': {
...         'abspath': {'type': 'string', 'format': 'filename'},
...         'relpath': {'type': 'string', 'format': 'filename'},
...         'nonpath': {'type': 'string'},
...         'has_default_1': {'type': 'string', 'default': 'a'},
...         'has_default_2': {'type': 'string', 'default': 'b'}
...     }
... }
>>> validator = ConfigValidator(schema, posixify_filenames=True)
>>> config = Config({
...     'abspath': '/bar/baz/qux',
...     'relpath': 'fred',
...     'nonpath': 'barney',
...     'has_default_2': 'c'  # override the default
... }, filename='/foo/bar/config.json')
>>> validator.validate(config) is None
True
>>> config
Config({'abspath': '/bar/baz/qux', 'relpath': '/foo/bar/fred', 'nonpath': 'barney', 'has_default_2': 'c', 'has_default_1': 'a'})
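The python-module format can be exercised similarly (a hedged sketch; the exact error message is illustrative):

>>> schema = {'type': 'string', 'format': 'python-module'}
>>> validator = ConfigValidator(schema)
>>> validator.validate('dnadna.utils') is None
True
>>> validator.validate('not_a_real_module_xyz')
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: ...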
- best_match(errors, key=relevance_with_const_select)[source]
Wraps jsonschema.exceptions.best_match to return a CustomValidationError. See the relevance_with_const_select documentation below.
- static relevance_with_const_select(error)[source]
This implements a custom heuristic for choosing the best-match error with dnadna.utils.config.ConfigValidator.

It prioritizes CustomValidationError instances over other errors, so that a schema with custom errorMsg properties can decide through that means which errors are most important. This can be especially useful when using errorMsg in a oneOf suite, where the custom error is perhaps more important than the default reason given for why none of the sub-schemas matched. Here's an example:

>>> schema = {
...     'oneOf': [{
...         'type': 'object',
...         'minProperties': 1,
...         'errorMsg': {
...             'minProperties': 'must have at least 1 entry'
...         }
...     }, {
...         'type': 'array',
...         'minItems': 1,
...         'errorMsg': {
...             'minItems': 'must have at least 1 entry'
...         }
...     }]
... }
This schema matches either an array or an object, which in either case must have at least one property (in the object case) or item (in the array case). Without this custom relevance function, best_match will just choose one of the errors from one of the oneOf schemas which caused it not to match. In this case it happens to select the type error from the first sub-schema:

>>> from jsonschema.exceptions import best_match
>>> from dnadna.utils.config import ConfigValidator
>>> validator = ConfigValidator(schema)
>>> errors = validator.iter_errors([])  # try an empty list
>>> best_match(errors)
<ValidationError: '[] should be non-empty'>
Using this custom error ranking algorithm, the CustomValidationError will be preferred:

>>> errors = validator.iter_errors([])  # try an empty list
>>> validator.best_match(errors,
...     key=ConfigValidator.relevance_with_const_select)
<CustomValidationError: 'must have at least 1 entry'>
Otherwise it's the same as the default heuristic, with extra support for a common pattern where oneOf combined with const or enum is used to select from a list of sub-schemas based on the value of a single property. For example:

>>> schema = {
...     'required': ['type'],
...     'oneOf': [{
...         'properties': {
...             'type': {'const': 'regression'},
...         }
...     }, {
...         'properties': {
...             'type': {'const': 'classification'},
...             'classes': {'type': 'integer'},
...         },
...         'required': ['classes']
...     }]
... }
...
The first schema in the oneOf list will match if and only if the document contains {'type': 'regression'}, and the second will match if and only if {'type': 'classification'}, with no ambiguity.

In this case, when type matches a specific sub-schema, the more interesting errors will be those that occur within that sub-schema. But the default heuristics are such that it will think the type error is more interesting. For example:

>>> import jsonschema
>>> jsonschema.validate({'type': 'classification'}, schema)
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: 'regression' was expected
...
Here the error that matched the heuristic happens to be the one that caused the first sub-schema to be skipped over, because properties.type.const did not match. But the actual reason an error was raised at all was that the second sub-schema didn't match either, due to the required 'classes' property being missing. Under this use case, that would be the more interesting error, and this heuristic solves that. In order to demonstrate it, we have to call best_match directly, since jsonschema.validate doesn't have an option to pass down a different heuristic key:

>>> from dnadna.utils.config import ConfigValidator
>>> validator = ConfigValidator(schema)
>>> errors = validator.iter_errors({'type': 'classification'})
>>> raise validator.best_match(errors,
...     key=ConfigValidator.relevance_with_const_select)
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: 'classes' is a required property
...
This also supports a similar pattern (used by several plugins) where instead of const being used to select a specific sub-schema, enum is used with a unique list of values (in fact const is just a special case of enum with only one value). For example:

>>> schema = {
...     'required': ['name'],
...     'oneOf': [{
...         'properties': {
...             'name': {'enum': ['my-plugin', 'MyPlugin']},
...         }
...     }, {
...         'properties': {
...             'name': {'enum': ['my-plugin2', 'MyPlugin2']},
...             'x': {'type': 'integer'},
...         },
...         'required': ['x']
...     }]
... }
...
>>> validator = ConfigValidator(schema)
>>> errors = validator.iter_errors({'name': 'my-plugin2'})
>>> raise validator.best_match(errors,
...     key=ConfigValidator.relevance_with_const_select)
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: 'x' is a required property
...
- validate(config, *args, **kwargs)[source]
Validate the config against the schema and raise a ConfigError if validation fails.

This can be enhanced by an extension to JSON-Schema, the "errorMsg" property, which can be added to schemas. All JSON-Schema validation errors have a default error message which, while technically correct, may not tell the full story to the user. For example:

>>> from dnadna.utils.config import ConfigValidator
>>> schema = {
...     'type': 'object',
...     'properties': {
...         'loss_weight': {
...             'type': 'number',
...             'minimum': 0,
...             'maximum': 1
...         }
...     }
... }
...
>>> validator = ConfigValidator(schema)
>>> validator.validate({'loss_weight': 2.0})
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'loss_weight': 2.0 is greater than the maximum of 1
However, if the schema has an "errorMsg" for "loss_weight" we can give a more descriptive error. The value of "errorMsg" may also include the following template variables:

- {property}: the name of the property being validated
- {value}: the value of the property being validated
- {validator}: the name of the validation being performed (e.g. 'minimum')
- {validator_value}: the value associated with the validator (e.g. 1 for "minimum": 1)
Let's try adding a more descriptive error message for validation errors on "loss_weight":

>>> schema = {
...     'type': 'object',
...     'properties': {
...         'loss_weight': {
...             'type': 'number',
...             'minimum': 0,
...             'maximum': 1,
...             'errorMsg':
...                 '{property} must be a floating point value '
...                 'between 0.0 and 1.0 inclusive (got {value})'
...         }
...     }
... }
...
>>> validator = ConfigValidator(schema)
>>> validator.validate({'loss_weight': 2.0})
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'loss_weight': loss_weight must be a floating point value between 0.0 and 1.0 inclusive (got 2.0)
Note
In the above example it would have been just as easy to explicitly write loss_weight in the error message instead of the template variable {property}, but the latter is more reusable (e.g. in definitions) and was used in this example just for illustration purposes.

The "errorMsg" property may also be an object/dict, mapping the names of validators to error messages specific to a validator. If it contains the validator "default", the default message is used as a fallback for any other validators that do not have a specific error message. For example, the following schema requires an array of at least one unique string. It provides a custom error message for minItems, but not for the other properties:

>>> schema = {
...     'type': 'array',
...     'items': {'type': 'string'},
...     'minItems': 1,
...     'uniqueItems': True,
...     'errorMsg': {
...         'default':
...             'must be an array of at least 1 unique string',
...         'minItems':
...             'array was empty (it must have at least 1 item)'
...     }
... }
...
>>> validator = ConfigValidator(schema)
>>> validator.validate([1, 2])
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at '1': 2 is not of type 'string'
>>> validator.validate(['a', 'a'])
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config: must be an array of at least 1 unique string
>>> validator.validate([])
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config: must be an array of at least 1 unique string
>>> validator.validate(['a', 'b', 'c'])
- class dnadna.utils.config.DeepChainMap(*maps, overrides={})[source]
Bases: ChainMap
Like collections.ChainMap, but also automatically applies chaining recursively to nested dictionaries.

For example, if two dictionaries in a DeepChainMap dc each contain the key 'c' holding a dictionary, then dc['c'] returns a DeepChainMap of those dictionaries. This follows the tree recursively until and unless the key 'c' in one of the parent maps does not refer to a dictionary; this can have the effect of "clobbering" dicts higher up in the tree. It is also possible to prevent recursion at a specific key by providing overrides.

- Parameters:
maps (list) – The sequence of mappings to chain together
- Keyword Arguments:
overrides (list) – (optional) – List of tuples giving the path to a key whose value should be overridden entirely by the mapping before it in the maps sequence. This is only relevant when the value is a dict: rather than merging the two dicts into a DeepChainMap, the first dict overrides the value of the second.
- maps
The sequence of mappings that is walked when looking up keys in a DeepChainMap. The key is looked up first in .maps[0], and so on until found or until the sequence is exhausted.

- Type: list
- overrides
List of paths into the mapping, in the format (key, subkey, ...), giving which keys should be overridden by values earlier in the maps list (see examples).

- Type: list
Examples
>>> from dnadna.utils.config import DeepChainMap
Simple case; this is no different from a regular collections.ChainMap:

>>> d = DeepChainMap({'a': 1, 'b': 2}, {'b': 3, 'd': 4})
>>> dict(d)
{'a': 1, 'b': 2, 'd': 4}
But when some of the maps contain nested maps at the same key, those are now also chained. Compare with a regular collections.ChainMap, in which the left-most dict under 'b' completely clobbers the dict in the right-hand 'b':

>>> from collections import ChainMap
>>> left = {'a': 1, 'b': {'c': 2, 'd': 3}}
>>> right = {'a': 2, 'b': {'c': 4, 'f': 5}, 'g': 6}
>>> c = ChainMap(left, right)
>>> dict(c)
{'a': 1, 'b': {'c': 2, 'd': 3}, 'g': 6}
With DeepChainMap the dicts under 'b' are chained as well. The DeepChainMap.dict method can be used to recursively convert all nested dicts to a plain dict:

>>> d = DeepChainMap(left, right)
>>> d.dict()
{'a': 1, 'b': {'c': 2, 'd': 3, 'f': 5}, 'g': 6}
As mentioned above, nested chaining only continues so long as each dict in the chain also contains a dict at the same key; a non-dict value can in a sense "interrupt" the chain:

>>> d = DeepChainMap({'a': {'b': 2}}, {'a': {'c': 3}}, {'a': 5},
...                  {'a': {'d': 4}})
>>> d.dict()
{'a': {'b': 2, 'c': 3}}
You can see that the right-most {'a': {'d': 4}} is ignored, since just before it {'a': 5} does not have a dict at 'a'. However, if 'a' is missing at some point along the chain that is not a problem; the nested mapping continues to the next map in the chain:

>>> d = DeepChainMap({'a': {'b': 2}}, {'a': {'c': 3}}, {},
...                  {'a': {'d': 4}})
>>> d.dict()
{'a': {'b': 2, 'c': 3, 'd': 4}}
You can also "interrupt" the chaining for dict values by providing the overrides argument; this is an advanced usage. In the first case d['a']['b'] is merged from both dicts:

>>> d = DeepChainMap({'a': {'b': {'c': 2}}, 'w': 'w'},
...                  {'a': {'b': {'d': 3}}, 'x': 'x'})
>>> d.dict()
{'a': {'b': {'c': 2, 'd': 3}}, 'w': 'w', 'x': 'x'}
But by passing overrides=[('a', 'b')], merging stops short at d['a']['b']:

>>> d = DeepChainMap({'a': {'b': {'c': 2}, 'w': 'w'}},
...                  {'a': {'b': {'d': 3}, 'x': 'x'}},
...                  overrides=[('a', 'b')])
>>> d.dict()
{'a': {'b': {'c': 2}, 'w': 'w', 'x': 'x'}}
Here you can see that the dicts keyed by ['a']['b'] were not merged, and only the first one was kept.

- copy(folded=False)[source]
New DeepChainMap or subclass with a new copy of maps[0] and refs to maps[1:].
If folded=True, however, it returns a copy with all maps folded in so that there is only one map in the resulting copy; that is, it is equivalent to DeepChainMap(chain_map.dict()).
- dict(cls=<class 'dict'>)[source]
Recursively convert self and all nested mappings to a plain dict, or to the type specified by the cls argument.
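Examples

For instance (a hedged sketch; any dict-like class should work for cls):

>>> from collections import OrderedDict
>>> d = DeepChainMap({'a': {'b': 1}}, {'a': {'c': 2}})
>>> od = d.dict(cls=OrderedDict)
>>> isinstance(od, OrderedDict) and isinstance(od['a'], OrderedDict)
True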
- get_owner(key, parent=False)[source]
Given a key, return the first nested map that contains that key.
Examples
>>> from dnadna.utils.config import DeepChainMap
>>> cm = DeepChainMap({'a': 1, 'b': 2}, {'b': 3, 'c': 4})
>>> cm.get_owner('b')
{'a': 1, 'b': 2}
>>> cm.get_owner('c')
{'b': 3, 'c': 4}
If parent=True, in the case of nested DeepChainMaps, it returns only the "inner-most" DeepChainMap containing the key. For example:

>>> inner = DeepChainMap({'c': 3}, {'d': 4})
>>> outer = DeepChainMap({'a': 1, 'b': 2}, inner)
>>> outer.get_owner('d')
{'d': 4}
>>> outer.get_owner('d', parent=True)
DeepChainMap({'c': 3}, {'d': 4})
- dnadna.utils.config.load_dict(filename, **kwargs)[source]
Loads a nested JSON-like data structure from a given filename.
May support multiple serialization formats, determined primarily by the filename extension. Currently supports:
- JSON (.json)
- YAML (.yml or .yaml)
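Examples

A minimal round-trip sketch (hedged; it reuses the pytest tmp_path fixture used elsewhere on this page):

>>> from dnadna.utils.config import load_dict, save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> save_dict({'a': 1}, tmp / 'data.yml')
>>> load_dict(tmp / 'data.yml')
{'a': 1}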
- dnadna.utils.config.load_dict_from_json(filepath)[source]
Load a JSON file as a dict.

Shortcut for load.

- Parameters:
filepath (str) – filepath to the JSON file
- dnadna.utils.config.save_dict(obj, filename, **kwargs)[source]
Serializes a nested JSON-like data structure to a given filename.
The serialization format is determined by the filename.
May support multiple serialization formats, determined primarily by the filename extension. Currently supports:
- JSON (.json)
- YAML (.yml or .yaml)
- dnadna.utils.config.save_dict_annotated(obj, filename, schema=None, validate=False, serializer=<class 'dnadna.utils.serializers.YAMLSerializer'>, **kwargs)[source]
Serializes a (possibly nested) dict to YAML, after (optionally) validating it against the given schema, and producing comments from the title/description keywords in the schema.

- Parameters:
obj (dict, Config) – The dict-like object to save.

filename (str, pathlib.Path, file-like) – A filename or pathlib.Path, or an open file-like object to which to stream the output.
- Keyword Arguments:
schema (str or dict) – (optional) – A schema given either as the name of a schema in the schema registry, or a full schema object given as a dict. If omitted, this is equivalent to save_dict to a YAML file, and no annotation is added.

validate (bool) – (optional) – Validate the given object against the schema before writing it (default: False). This can be used in case the object is not already known to be valid against the schema.

serializer (DictSerializer) – (optional) – Specify the DictSerializer to use; normally this should be the YAMLSerializer since it's the only one (currently) which supports comments.
Examples
>>> from io import StringIO
>>> from dnadna.utils.config import save_dict_annotated
>>> schema = {
...     'description': 'file description',
...     'properties': {
...         'a': {'type': 'string', 'title': 'a',
...               'description': 'a description'},
...         'b': {'type': 'integer', 'description': 'b description'},
...         'c': {
...             'type': 'object',
...             'description': 'c description',
...             'properties': {
...                 'd': {'description': 'd description'},
...                 'e': {'description': 'e description'}
...             }
...         },
...         'f': {'description': 'f description'}
...     }
... }
...
>>> d = {'a': 'foo', 'b': 2, 'c': {'d': 4, 'e': 5}, 'f': 6}
>>> out = StringIO()
>>> save_dict_annotated(d, out, schema=schema, validate=True, indent=4)
>>> print(out.getvalue())
# file description
# a
#
# a description
a: foo
# b description
b: 2
# c description
c:
    # d description
    d: 4
    # e description
    e: 5
# f description
f: 6